Abstract. Image compression has been a frequent topic of presentations at 
ADASS. Compression is often viewed as just a technique to fit more data into 
a smaller space. Rather, the packing of data - its "density" - affects every 
facet of local data handling, long distance data transport, and the end-to-end 
throughput of workflows. In short, compression is one aspect of proper data 
structuring. For example, with FITS tile compression the efficient representation 
of data is combined with an expressive logistical paradigm for its manipulation. 

A deeper question remains. Not just how best to represent the data, but 
which data to represent. CCDs are linear devices. What does this mean? One 
thing it does not mean is that the analog-to-digital conversion of pixels must be 
stored using linear data numbers (DN). An alternative strategy of using non- 
linear representations is presented, with one motivation being to magnify the 
efficiency of numerical compression algorithms such as Rice. 



1. Data Representation and Compression 

The Rice compression algorithm (Rice et al. 1993) is particularly familiar to as- 
tronomers in combination with the FITS Tiled Image Convention (White et al. 
2006, Seaman et al. 2007). It has been informally paired in the past with non- 
linear data representations (Nieto-Santisteban et al. 1999, Nicula et al. 2005). 
Mention of compression is often followed immediately by the word "scheme", as 
if this branch of computer science were suitable only for black-box heuristics ar- 
rived at through a process of trial-and-error. This paper seeks to begin a process 
of layering the issue of optimal data encoding onto a more formal foundation. 

As discussed in Pence, Seaman & White (2009A,B), the compression achieved 
for an astronomical image is typically determined almost entirely by its back- 
ground noise. Any processing that quiets the background will increase the com- 
pression ratio. This is the foundation for all lossy compression algorithms - in 
effect, lossy compression combines a preprocessing step that tempers an image's 
noise characteristics with the subsequent application of a lossless algorithm. 

Janesick (2001) describes a non-linear hardware or software component 
called a "square-rooter". Instead of applying a linear analog-to-digital 
conversion when reading out a CCD, it encodes the square-root of the pixel 
values, which linearizes the Poisson statistics that characterize these 
detectors. Such a DN encoding has been used on several space missions to reduce 
bandwidth requirements for data transport. While technically lossy if the 
square-root transform occurs after the A/D conversion, it is equally reasonable 
to consider this a type of lossless encoding since the transform maintains 
oversampling of the noise at both the high and low end of the dynamic range. 

Square-root encoding itself acts as a form of compression - 65,536 linear 
data levels (for 16-bit pixels) will turn into a scant 256 levels after the 
square-root. This is not nearly as dramatic when expressed in bits, since 
mapping 16 bits into 8 bits corresponds to a compression ratio of just R = 2. 
The more important aspect, rather, is to linearize the noise. 
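
The level counts quoted above can be checked with a minimal Python sketch. It assumes a floor (integer) square root as the encoder, which is only one possible rounding choice and is not specified in the text; the function name `sqrt_encode` is hypothetical.

```python
import math

def sqrt_encode(dn):
    """Map a linear 16-bit DN to a code using the floor of its square root (assumed rounding)."""
    return math.isqrt(dn)

# Collect every distinct output code over the full 16-bit input range.
codes = {sqrt_encode(dn) for dn in range(2 ** 16)}
n_levels = len(codes)

bits_in = 16
bits_out = math.ceil(math.log2(n_levels))  # bits needed to hold every code
ratio = bits_in / bits_out

print(n_levels, bits_out, ratio)
```

With round-to-nearest instead of floor, the top input would map to code 256, so a real encoder would need to clamp or rescale to stay within 8 bits.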

2. Variance Stabilization 

A vector of random variables (i.e., an image) is said to be heteroscedastic if the 
variance depends on the signal. Many astronomical detectors are governed by 
photon shot-noise, described by Poisson statistics with the noise varying as the 
square-root of the signal. Bright pixels are noisy pixels. 

Many familiar statistical techniques such as least-squares fitting formally 
require homoscedasticity. The penalty for ignoring this requirement for a par- 
ticular purpose may range from negligible to significant, but the broader point 
is that astronomical noise models are often very non-linear. Data compression 
techniques as applied to astronomical data must take this into account. 

Techniques exist to stabilize the variance, that is, to convert a 
heteroscedastic data set to one that is homoscedastic. One such technique is the 
Anscombe transform (Anscombe 1948): 

$$A(x) = 2\sqrt{x + \tfrac{3}{8}} \qquad (1)$$

The Anscombe transform will convert a Poisson sample to have (nearly) Gaus- 
sian statistics. The factor of 2 ensures a unit variance. 
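
The unit-variance claim can be illustrated numerically. The sketch below assumes the standard form of the Anscombe transform, 2*sqrt(x + 3/8), and computes the variance exactly by summing over the Poisson probability mass function rather than by random sampling; the helper name `poisson_moments` is hypothetical.

```python
import math

def anscombe(x):
    """Standard Anscombe transform (assumed form): 2 * sqrt(x + 3/8)."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)

def poisson_moments(lam, f):
    """Mean and variance of f(X) for X ~ Poisson(lam), summing the pmf directly."""
    kmax = int(lam + 12.0 * math.sqrt(lam) + 50)  # far enough into the tail
    m1 = m2 = 0.0
    for k in range(kmax + 1):
        # Work in log space so large means do not underflow or overflow.
        p = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        y = f(k)
        m1 += p * y
        m2 += p * y * y
    return m1, m2 - m1 * m1

for lam in (10.0, 100.0, 1000.0):
    _, var_raw = poisson_moments(lam, float)
    _, var_stab = poisson_moments(lam, anscombe)
    print(f"mean={lam:7.1f}  raw variance={var_raw:9.2f}  stabilized variance={var_stab:.4f}")
```

The raw variance tracks the mean, while the transformed variance stays close to one across two orders of magnitude in signal.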

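Real astronomical data does not follow a pure Poisson noise model. For 
instance, each pixel on a CCD has additive Gaussian read-noise as well as Poisson 
shot-noise. The Generalized Anscombe Transform (Murtagh et al. 1995) will 
stabilize the variance in this case: 

$$G(x) = \frac{2}{\alpha}\sqrt{\alpha x + \tfrac{3}{8}\alpha^2 + \sigma^2 - \alpha g} \qquad (2)$$

where $\alpha$ = 1/gain, and $g$ and $\sigma$ are the mean and standard deviation 
of the Gaussian noise term. This can be rearranged into a form that uses familiar 
CCD terminology (with care to keep units straight between DN and e-): 

$$I'_{xy} = 2\sqrt{\mathrm{gain}\,(I_{xy} - \mathrm{bias}) + \tfrac{3}{8} + \sigma_{read}^2} \qquad (3)$$
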
Here the quantities have their usual meaning: 

• $I_{xy}$ is a pixel value in analog-to-digital data numbers (DN) 

• the CCD bias is in DN 

• the CCD gain is in e-/DN 

• the readout noise, $\sigma_{read}$, is in electrons, e- 
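
A minimal Python sketch of a bias-corrected, gain-scaled stabilizing transform using the quantities listed above. It assumes the standard generalized Anscombe form expressed in electrons, 2*sqrt(gain*(I - bias) + 3/8 + read_noise^2); the function names are hypothetical, and clamping a negative radicand to zero is an assumption made only so the sketch runs, not a recommendation from the text.

```python
import math

def gat_ccd(dn, bias, gain, read_noise):
    """Stabilized value for a pixel in DN; gain in e-/DN, read_noise in e-."""
    radicand = gain * (dn - bias) + 3.0 / 8.0 + read_noise ** 2
    # Assumption: clamp negative radicands (pixels far below bias) to zero.
    return 2.0 * math.sqrt(max(radicand, 0.0))

def propagated_noise(dn, bias, gain, read_noise):
    """First-order noise after the transform: slope times input noise (both in DN terms)."""
    signal_e = max(gain * (dn - bias), 0.0)
    noise_dn = math.sqrt(signal_e + read_noise ** 2) / gain
    slope = gain / math.sqrt(signal_e + 3.0 / 8.0 + read_noise ** 2)
    return slope * noise_dn

# Hypothetical detector: bias 1000 DN, gain 2 e-/DN, read-noise 5 e-.
for dn in (1000, 1100, 10000, 60000):
    print(dn, round(gat_ccd(dn, 1000, 2.0, 5.0), 3),
          round(propagated_noise(dn, 1000, 2.0, 5.0), 4))
```

The propagated noise stays near one from the bias level to the top of the range, which is the sense in which the variance is stabilized.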

3. Photon Transfer 

In this form, variance stabilization becomes very reminiscent of the CCD photon 
transfer technique (Janesick 2007). (Many of this paper's conclusions will be 
applicable to other types of astronomical detectors, particularly in the optical 
and infrared.) Gain is the slope of a CCD's mean-variance relation, but photon 
transfer permits much finer grained analysis than simply assuming a scalar gain. 

Bright pixels greatly oversample the noise. All digital devices must contend 
with quantization noise as a term like $\sqrt{1/12}$ = 0.2887 DN (Widrow & Kollar 
2008). To avoid stairstep aliasing effects, the gain of a CCD should be kept lower 
than the read-noise, which is typically a few e-. On the other hand, the high end 
of the dynamic range is governed by shot noise, scaling as the square-root of the 
signal (in e-). For a gain fixed at unity, the noise for pixels near the top of the 
16-bit range will be 256 DN, fully 2 orders of magnitude above the read-noise. 
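
The figures in this paragraph follow from simple arithmetic, sketched here in Python. The read-noise of 3 e- is an assumed illustrative value standing in for "a few"; it does not come from the text.

```python
import math

# Standard deviation of a uniform quantization error spanning one DN step.
quant_noise_dn = math.sqrt(1.0 / 12.0)

# Shot noise at the top of a 16-bit range, with gain fixed at 1 e-/DN.
full_scale_e = 2 ** 16
shot_noise_dn = math.sqrt(full_scale_e)

# Assumed read-noise of "a few" electrons (hypothetical value).
read_noise_e = 3.0
orders_of_magnitude = math.log10(shot_noise_dn / read_noise_e)

print(f"{quant_noise_dn:.4f} DN, {shot_noise_dn:.0f} DN, {orders_of_magnitude:.2f} dex")
```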

Noise adds in quadrature - permitting the gain to be scaled with total noise: 

$$\mathrm{gain} \propto \sqrt{\sigma_{read}^2 + \sigma_{shot}^2 + \sigma_{FPN}^2} \qquad (4)$$

This is simply another way to look at variance stabilization (see, for example, 
figure 11.19 from Janesick 2007). The goal is to tame all the sources of noise. 

Astronomical images are not simply random agglomerations of pixels, they 
result from astronomical detectors with specific characteristics and noise models. 
For CCDs (quantum yield $\eta$ = 1) this becomes: 

$$\mathrm{gain} \propto \sqrt{\sigma_{read}^2 + \eta S + (P_N S)^2} \qquad (5)$$

This presents a quandary, since the fixed pattern noise (FPN) dominates both the 
Gaussian read-noise and Poisson shot noise at the bright end (if not flatfielded). 
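
The three regimes can be made concrete with a short Python sketch that adds read-noise, shot noise, and fixed pattern noise in quadrature as a function of signal S (in e-). The read-noise of 3 e- and the FPN factor of 1% are assumed illustrative values, and the function names are hypothetical.

```python
import math

def noise_terms(signal_e, read_noise_e=3.0, eta=1.0, p_n=0.01):
    """Individual noise terms and their quadrature sum, all in electrons."""
    shot = math.sqrt(eta * signal_e)   # Poisson shot noise
    fpn = p_n * signal_e               # fixed pattern noise, linear in signal
    total = math.sqrt(read_noise_e ** 2 + shot ** 2 + fpn ** 2)
    return {"read": read_noise_e, "shot": shot, "fpn": fpn, "total": total}

def dominant(signal_e, **kwargs):
    """Name of the largest single noise term at this signal level."""
    terms = noise_terms(signal_e, **kwargs)
    return max(("read", "shot", "fpn"), key=lambda name: terms[name])

for s in (1.0, 1000.0, 10000.0, 60000.0):
    t = noise_terms(s)
    print(f"S={s:8.0f} e-  total={t['total']:8.2f} e-  dominant={dominant(s)}")
```

Under these assumed values shot noise and FPN are equal at S = eta / P_N^2 = 10,000 e-, and FPN dominates above that level.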



4. Thoughts on Optimal Encoding 

The question is how to put this together into a coherent recipe. Work is ongoing, 
but broad strokes are clear. First, the quandary is only apparent, since to first 
order the signal (and thus the FPN) is negligible. Recall that the compression 
ratio, R, is determined by the noise in the background (Pence et al. 2009A,B): 

$$R = \frac{\mathrm{BITPIX}}{N_{bits} + K} \qquad (6)$$

where ($\sigma$ estimated per Stoehr 2007): 

$$N_{bits} = \log_2 \sigma + 1.792 \qquad (7)$$
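
A minimal Python sketch of how the compression ratio responds to background noise under equations (6) and (7). The overhead term K = 1.2 bits per pixel is an assumed illustrative value for Rice, and the function name is hypothetical; the relation is only meaningful where the noise is large enough for the denominator to stay positive.

```python
import math

def compression_ratio(sigma, bitpix=16, k=1.2):
    """R = BITPIX / (N_bits + K), with N_bits = log2(sigma) + 1.792."""
    n_bits = math.log2(sigma) + 1.792   # equivalent noise bits per pixel
    return bitpix / (n_bits + k)

# Halving the background noise removes one noise bit per pixel.
for sigma in (16.0, 8.0, 4.0, 2.0, 1.0):
    print(f"sigma={sigma:5.1f} DN  R={compression_ratio(sigma):.2f}")
```

Any re-encoding that lowers the background sigma measured in output units therefore raises R directly.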

Second, phenomenological schemes such as eq. 7.86 from Janesick (2001): 

$$DN_{out} = \sqrt{DN_{in}} \times 2^{(N_{out} - \mathrm{BITPIX}/2)} \qquad (8)$$

are simplified versions of the more formal functional behavior. Equation (8) is 
the same as the Anscombe transform when $N_{out}$ = 1 + BITPIX/2 (e.g., $N_{out}$ = 9 
for BITPIX = 16) and when the 3/8 term is negligible ($DN_{in} \gtrsim 30$). 

Whatever the functional form, a lookup table (LUT) is an efficient 
implementation. The mapping $DN_{in} \Rightarrow DN_{out}$ is many-to-one 
(surjective but not injective): several inputs map to one output. 
The inverse LUT is compact as a header or table structure under the FITS Tiled 
Image Convention. Libraries like CFITSIO (http://heasarc.gsfc.nasa.gov/fitsio) 
that support tiling will be able to use such LUTs to transparently recover the 
linear data with adequate noise sampling. For some purposes linearization is 
not needed. Square root encoding may be just what is wanted to load an image 
display, or for homoscedastic statistics (e.g., principal components analysis), or 
for multiresolution techniques using wavelets (Murtagh et al. 1995). 
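
A lookup-table implementation can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of any library's behavior: it uses the scaled square-root form with N_out = 9 and BITPIX = 16, rounds to the nearest code, clamps the top code so it fits in N_out bits, and builds the inverse table by inverting the function analytically. The name `build_luts` is hypothetical.

```python
import math

def build_luts(bitpix=16, n_out=9):
    """Return (forward, inverse) tables for a scaled square-root encoding."""
    scale = 2.0 ** (n_out - bitpix / 2)
    top_code = 2 ** n_out - 1
    # Forward table: one entry per possible linear input DN.
    forward = [min(round(scale * math.sqrt(dn)), top_code)
               for dn in range(2 ** bitpix)]
    # Inverse table: one representative linear DN per output code.
    inverse = [round((code / scale) ** 2) for code in range(top_code + 1)]
    return forward, inverse

forward, inverse = build_luts()

# Round-trip a few values and compare the error with sqrt(DN),
# the shot noise for a gain of 1 e-/DN.
for dn in (0, 30, 1000, 40000, 65535):
    recovered = inverse[forward[dn]]
    print(dn, forward[dn], recovered, abs(recovered - dn), round(math.sqrt(dn), 1))
```

The forward table has 65,536 entries but the inverse needs only 512, which is why it is small enough to travel with the compressed image.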

Some issues remain to be resolved: 

1. Since it is the background noise that matters, the low end mapping is 
key. However, in equation (3) each pixel value is corrected for the bias, 
potentially driving the radicand negative. How can this best be resolved? 

2. What is the best high end mapping for raw and flat-fielded images? 

3. When bringing the CCD noise model into alignment with the Generalized 
Anscombe Transform, it is easy to get lost between e- and DNs (also, what 
most call the gain, Janesick calls sensitivity, the inverse gain). Effects like 
quantum yield need to be folded into the variance stabilization. 

4. The Data Compression literature (e.g., Salomon 2004) is built on a 
foundation of Information Theory (e.g., Cover & Thomas 2006). As such, it 
may be revealing to recast both the variance stabilization and CCD noise 
modeling techniques in terms of the Shannon entropy (Shannon 1948). 
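
As a small step toward item 4, the Shannon entropy of a pure Poisson pixel distribution can be compared before and after square-root encoding. The Python sketch below is an illustration only: it assumes a background of 1000 e- at unit gain with no read-noise, uses the Anscombe form rounded to the nearest integer as the encoder, and all function names are hypothetical.

```python
import math

def poisson_pmf(lam):
    """Poisson probabilities out to a distant tail, computed in log space."""
    kmax = int(lam + 12.0 * math.sqrt(lam) + 50)
    return [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
            for k in range(kmax + 1)]

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def encoded_pmf(pmf):
    """Distribution of codes after rounding 2*sqrt(k + 3/8) to an integer."""
    codes = {}
    for k, p in enumerate(pmf):
        code = round(2.0 * math.sqrt(k + 3.0 / 8.0))
        codes[code] = codes.get(code, 0.0) + p
    return list(codes.values())

lam = 1000.0  # assumed background level in electrons
pmf = poisson_pmf(lam)
h_linear = entropy_bits(pmf)
h_encoded = entropy_bits(encoded_pmf(pmf))
print(f"linear: {h_linear:.2f} bits/pixel, encoded: {h_encoded:.2f} bits/pixel")
```

The linear entropy grows as half the base-2 logarithm of the signal, while the encoded entropy stays near that of a unit-variance Gaussian sampled at unit steps.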